Using a small development set to build a robust dialectal Chinese speech recognizer
Abstract
To make full use of a small development data set when building a robust dialectal Chinese speech recognizer from a standard Chinese speech recognizer (based on Chinese Initial/Final, IF), a novel, simple but effective acoustic modeling method, named state-dependent phoneme-based model merging (SDPBMM), is proposed and evaluated. In SDPBMM, a shared state of a standard tri-IF is merged with a state of a dialectal mono-IF in terms of pronunciation variation modeling. Specifically, to deal with phonetic-level pronunciation variations in SDPBMM, distance-based pronunciation modeling is proposed based on a small dialectal Chinese data set. With a 40-minute Shanghai-dialectal Chinese data set, SDPBMM achieves a significant syllable error rate (SER) reduction of 14.3% for dialectal Chinese with almost no performance degradation for standard Chinese. Experimentally, SDPBMM also outperforms maximum likelihood linear regression (MLLR) adaptation and the pooled retraining method, with relative SER reductions of 2.8% and 10.6%, respectively. When SDPBMM is combined with MLLR adaptation, a further relative SER reduction of 3.3% is achieved.
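The state-level merge described in the abstract can be illustrated with a minimal sketch. The abstract does not give the merging formula, so everything here is an assumption made for illustration: each state is modeled as a diagonal-covariance Gaussian mixture, and the merged state is a linear interpolation of the standard and dialectal mixtures with a hypothetical weight `lam`. The names `DiagGMM` and `merge_states` are invented for this example and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class DiagGMM:
    """Diagonal-covariance Gaussian mixture for one HMM state (illustrative)."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def density(self, x: List[float]) -> float:
        """Return the mixture density p(x | state)."""
        total = 0.0
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_p = 0.0
            for xi, m, v in zip(x, mu, var):
                log_p += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            total += w * math.exp(log_p)
        return total


def merge_states(standard: DiagGMM, dialectal: DiagGMM, lam: float) -> DiagGMM:
    """Merge two state mixtures by linear interpolation.

    ASSUMPTION: the merged density is
        lam * p_standard(x) + (1 - lam) * p_dialectal(x),
    realized by pooling the components and rescaling their weights.
    How the paper actually sets the weights is not stated in the abstract.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return DiagGMM(
        weights=[lam * w for w in standard.weights]
        + [(1.0 - lam) * w for w in dialectal.weights],
        means=[list(m) for m in standard.means + dialectal.means],
        variances=[list(v) for v in standard.variances + dialectal.variances],
    )


# Toy one-dimensional example: a two-component standard tri-IF state and a
# single-component dialectal mono-IF state (all numbers are made up).
standard_state = DiagGMM([0.6, 0.4], [[0.0], [1.0]], [[1.0], [2.0]])
dialectal_state = DiagGMM([1.0], [[0.5]], [[1.5]])
merged_state = merge_states(standard_state, dialectal_state, lam=0.7)
```

Pooling components keeps the standard model's Gaussians untouched, which is one plausible reading of why performance on standard Chinese degrades very little; the merged state simply gains extra components that cover the dialectal pronunciation.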
Related resources
State-Dependent Phoneme-Based Model Merging for Dialectal Chinese Speech Recognition
Aiming at building a dialectal Chinese speech recognizer from a standard Chinese speech recognizer with a small amount of dialectal Chinese speech, a novel, simple but effective acoustic modeling method, named state-dependent phoneme-based model merging (SDPBMM) method, is proposed and evaluated, where a tied-state of standard triphone(s) will be merged with a state of the dialectal monophone th...
Using English Phoneme Models for Chinese Speech Recognition
To build a speech recognizer, database design, collection and transcription is the most time consuming and tedious job. This paper proposes some fast and easy methods to use English phoneme models for Mandarin and Cantonese speech recognition with little to no training data in Mandarin and Cantonese. While a recognizer built with such transformed models might not perform as ideally as one that ...
A Binaural Microscopic Model Based on a Modulation Filter Bank for Predicting Speech Intelligibility in Normal-Hearing Listeners
In this study, a binaural microscopic model for the prediction of speech intelligibility based on the modulation filter bank is introduced. So far, the spectral criteria such as the STI and SII or other analytical methods have been used in the binaural models to determine the binaural intelligibility. In the proposed model, unlike all models of binaural intelligibility prediction, an automatic ...
Development of a Speech Recognizer for the Dutch Language
This paper describes the development of a large vocabulary speaker independent speech recognizer for the Dutch language. The recognizer was built using Hidden Markov Toolkit and the Polyphone database of recorded Dutch speech. A number of systems have been built ranging from a simple monophone recognizer to a sophisticated system that uses backed-off triphones. The system has been tested using ...
Narrow Phonetic Transcription for Development of a Large Vocabulary Isolated Word Recognizer
To build a very large vocabulary (50K) isolated word speech recognizer, speech data from over 200 native speakers of American English was recorded and manually transcribed. This paper explains the transcription method used, the motivation, and preliminary results implemented in the recognizer. The symbol set was expanded to allow for narrow phonetic transcriptions, similar to the level of detai...
Publication date: 2007